DOCUMENT RESUME 



ED 038 673 



CG 005 218 



AUTHOR          Archambault, Francis X., Jr.
TITLE           A Computerized Approach to Scoring Verbal Responses
                to the Torrance Tests of Creative Thinking.
INSTITUTION     Boston Univ., Mass.
PUB DATE        [70]
NOTE            11p.

EDRS PRICE      EDRS Price MF-$0.25 HC-$0.65
DESCRIPTORS     *Computer Oriented Programs, Computers, *Creative
                Ability, *Creative Thinking, Creativity Research,
                Elementary School Students, Students, Testing, *Test
                Results, Tests, *Test Scoring Machines



ABSTRACT

This paper describes a study of a computerized approach to scoring
the Torrance Tests of Creative Thinking (TTCT). A total of 153 students
from grades four through seven were involved, 100 in a developmental
sample on which the computerized scoring procedures were developed, and
a cross validation sample composed of the remaining 53. This research
was limited to three of the seven subtests of the TTCT. Subjects'
responses to each of the activities are scored for fluency, flexibility,
and originality. The fluency score is defined as the total number of
relevant responses given; flexibility as the number of different
clusters of responses; originality was scored based on three
dictionaries, with originality weights of zero, one, and two. The
step-wise multiple regression technique was employed to maximize the
prediction of each subject's score for each activity of the TTCT. The
prediction of fluency was the most accurate. However, with some
corrections, both flexibility and originality results were improved. It
appears that creativity, as defined by Torrance, can be judged
accurately by a computer. (KJ)









U.S. DEPARTMENT OF HEALTH, EDUCATION & WELFARE
OFFICE OF EDUCATION
THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM THE PERSON OR
ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS STATED DO NOT
NECESSARILY REPRESENT OFFICIAL OFFICE OF EDUCATION POSITION OR POLICY.



A COMPUTERIZED APPROACH TO 
SCORING VERBAL RESPONSES TO THE
TORRANCE TESTS OF CREATIVE THINKING 



Francis X. Archambault, Jr.



Boston University 







Since the last decade when Guilford (1950) called attention to the 
virtual neglect of the concept of creativity by American researchers, there 
has been an enormous expansion of interest and research in the nature of 
this higher mental process. A myriad of problems and controversies have 
surrounded work in the area of creativity, but one of the most pressing 
issues continually has been the search for valid and reliable means of 
measuring creative performance. 

The recent publication of the Torrance Tests of Creative Thinking 
(Torrance, 1966) in many respects may be regarded as a breakthrough in 
the area of creativity measurement. Based on nearly nine years of research 
and development by Torrance and his colleagues, the tests represent a 
pioneering venture in that they provide the researcher and educational 
practitioner with a functional instrument for measuring creative potential 
in children, adolescents, and adults. In spite of the relatively high 
level of development of the Torrance instruments, certain technical problems 
related to levels of training on the part of the scorers may act as a 
deterrent to their widespread use. At least one reviewer (Hoepfner, 1967) 
has called attention to these problems and also has suggested that the time 
required to score the test battery may be relatively long. These




shortcomings may be dismissed, however, by using a computer to score the 
verbal responses to the TTCT for, unlike humans, the computer functions as 
a perfectly reliable judge which does not suffer from fatigue or lapses 
of attention. Moreover, the computer might perform this service with savings 
of both time and money. 

To determine the effectiveness of such a computerized approach, a
sample of 153 pupils from grades 4, 5, 6, and 7 in six Central New York
State public school systems was employed. These 153 subjects were randomly
assigned to a developmental sample of size 100, on which the computerized
scoring procedures were developed, and a cross-validation sample composed
of the remaining 53 subjects (Mosier, 1951).
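
The random assignment described above can be illustrated with a minimal sketch. The paper does not say how the randomization was carried out; the function name, the integer subject IDs, and the fixed seed below are all assumptions made for illustration.

```python
import random

def split_sample(subject_ids, n_dev, seed=0):
    """Randomly assign subjects to a developmental and a cross-validation sample."""
    rng = random.Random(seed)          # fixed seed only so the sketch is repeatable
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    return sorted(shuffled[:n_dev]), sorted(shuffled[n_dev:])

# 153 hypothetical subject IDs standing in for the pupils in the study
subjects = range(1, 154)
developmental, cross_validation = split_sample(subjects, n_dev=100)
print(len(developmental), len(cross_validation))
```

The two samples are disjoint by construction, so equations developed on the first can be checked on subjects they have never seen.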

Each of these subjects was administered the TTCT, Verbal Form A, but
the present research dealt solely with the open-ended responses to three of
the seven activities or subtests included in the battery. The activities
considered were the Ask and Guess subtests (Activities 1, 2, and 3), in which
subjects ask questions about a drawing and make guesses about the causes
and consequences of a pictured event.

The subjects' responses to each of these activities are scored by
human judges for Fluency, Flexibility, and Originality. Fluency is, according
to Torrance, the total number of relevant responses given for each activity;
Flexibility is the number of different categories of responses or the number
of shifts in response emphasis for each of the subtests; and Originality is a
measure of the infrequency of each response. The Originality score for each
activity is the sum of the Originality scores for each of the individual
responses.
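
As a rough illustration of these three definitions, the sketch below computes the scores for one activity from responses that have already been judged. The tuple layout, the category labels, and the sample data are hypothetical, and Flexibility is taken here as the number of distinct categories rather than the number of shifts in emphasis.

```python
def score_activity(judged_responses):
    """Score one activity.

    judged_responses: list of (is_relevant, category, originality_weight)
    tuples, one per response, as a judge might record them (assumed layout).
    """
    relevant = [r for r in judged_responses if r[0]]
    fluency = len(relevant)                                   # relevant responses
    flexibility = len({category for _, category, _ in relevant})  # distinct categories
    originality = sum(weight for _, _, weight in relevant)    # summed weights
    return fluency, flexibility, originality

# Hypothetical judged responses for a single subject and activity
example = [
    (True, "cause", 0),
    (True, "cause", 1),
    (True, "place", 2),
    (False, None, 0),   # irrelevant response, ignored by all three scores
]
scores = score_activity(example)
print(scores)  # (3, 2, 3)
```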

Using the scoring procedure set forth in the Directions Manual and
Scoring Guide of the TTCT, four trained human judges (Archambault, 1969)
scored the responses of the 153 subjects. The separate judge scores were then
pooled to obtain criterion measures against which the performance of the
computerized approach could be gauged. The pooled reliabilities of these
judges (Winer, 1962, pp. 124-132) are shown in Table 1. As evident from the
table, the reliabilities are all extremely high, with the possible exception
of Activity 3, Originality.
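
The paper does not state which analysis-of-variance estimate was used for the pooled reliabilities. One common form for the reliability of the mean of k judges is (MS between subjects - MS within subjects) / MS between subjects; the sketch below assumes that form, and the function name and the small ratings matrix are hypothetical.

```python
def pooled_reliability(ratings):
    """Reliability of the mean of k judges from a one-way analysis of variance.

    ratings: one row per subject, each row holding the k judges' scores.
    Assumes the estimate (MS_between - MS_within) / MS_between.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for row, m in zip(ratings, row_means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / ms_between

# Three hypothetical subjects scored by two judges who differ by one point
example_ratings = [[1, 2], [3, 4], [5, 6]]
print(pooled_reliability(example_ratings))  # 0.9375
```

When the judges agree exactly, the within-subject mean square is zero and the estimate is 1.0; disagreement among judges pulls it down.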

To perform the computerized scoring of the data it was first necessary to
transcribe the responses of each subject into machine readable form. This
was accomplished by keypunching the responses on standard IBM cards, one
response to a card. Since no corrections in spelling, punctuation, grammar,
etc., were made on the original copy, the keypunched data were an exact
duplicate of the responses given in the test booklets. The actual scoring of
the test was performed by Fisher's (1968) SCORTXT program, a system consisting
of a main program and nine subroutines currently operating under the IBM 360
OS system. In using the Fisher program two separate scoring strategies were
employed, sometimes in concert. The first strategy was modeled directly
after the manual scoring procedure developed by Torrance. The second involved
the use of various actuarial measures which have proven valuable in related



                              Table 1

             RELIABILITY ESTIMATES FOR FOUR JUDGES FOR
             FLUENCY, FLEXIBILITY, AND ORIGINALITY OF
             ACTIVITIES 1, 2, AND 3 OF THE TORRANCE
             TESTS OF CREATIVE THINKING, VERBAL FORM A,
             USING ANALYSIS OF VARIANCE

                             Total     Developmental   Cross-Validation
                             Sample    Sample          Sample

Activity 1, Fluency           .99       .99             .99
Activity 1, Flexibility       .98       .98             .98
Activity 1, Originality       .81       .81             .79
Activity 2, Fluency           .95       .96             .95
Activity 2, Flexibility       .93       .93             .93
Activity 2, Originality       .80       .84             .71
Activity 3, Fluency           .93       .94             .91
Activity 3, Flexibility       .92       .93             .90
Activity 3, Originality       .66       .73             .52






research by a number of investigators (Page and Paulus, 1968; Marcotte, 1969;
McManus, 1968). Since the responses were judged at separate times for
Fluency, Flexibility, and Originality, and since the scoring strategy used is
dependent on whether Fluency, Flexibility, or Originality is being assessed,
the method used for each of these will be described separately.

As mentioned previously, the Fluency score for each activity is defined 
as the total number of relevant responses given. It was hypothesized that the 
Fluency score could be determined without assessing the relevance of the 
individual responses, and that, because of this, simple actuarial measures could 
be used to predict Fluency. Following this hypothesis, students' responses 
were reduced by SCORTXT to a series of counts or frequency scores on a variety 
of variables, a listing of which is given in Figure I. These variables were 
then used in a step-wise multiple regression analysis to predict the Fluency 
score. 
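
A minimal sketch of how a response might be reduced to counts of this kind follows. SCORTXT's own tokenizing rules are not described in the paper, so the regular expression used to find words, the feature names, and the sample text are assumptions; only a subset of the variables is computed.

```python
import re
import statistics

def actuarial_counts(text):
    """Reduce a response text to simple counts and word-length statistics."""
    words = re.findall(r"[A-Za-z']+", text)   # assumed definition of a "word"
    lengths = [len(w) for w in words]
    features = {
        "question_marks": text.count("?"),
        "commas": text.count(","),
        "periods": text.count("."),
        "n_words": len(words),
        "avg_word_length": statistics.mean(lengths) if lengths else 0.0,
        "sd_word_length": statistics.pstdev(lengths) if lengths else 0.0,
    }
    # One count per word length from one through ten letters
    for n in range(1, 11):
        features[f"words_len_{n}"] = sum(1 for length in lengths if length == n)
    return features

# Hypothetical responses from one subject, joined into a single text
counts = actuarial_counts("Why is he sad? Is it cold, wet?")
print(counts["question_marks"], counts["n_words"], counts["avg_word_length"])
```

Because each question asked tends to add a question mark and a handful of words, counts like these can track the number of responses without any judgment of relevance, which is the hypothesis stated above.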

The Flexibility score for each activity is defined as the number of
different clusters of responses or the number of shifts in response emphasis.
For each activity of the TTCT, Torrance has isolated categories into which
the responses might fall. Twenty-two such categories have been isolated
for Activity 1, while for Activities 2 and 3, 21 Flexibility clusters have
been determined. For each of these categories a dictionary of entries to
be used in the computerized scoring procedure was built. The dictionaries
were constructed by analyzing the model responses given by Torrance for
key words and phrases and then isolating synonyms of these key words and phrases




Number of Question Marks
Number of Commas
Number of Periods
Number of Words of Length One
Number of Words of Length Two
Number of Words of Length Three
Number of Words of Length Four
Number of Words of Length Five
Number of Words of Length Six
Number of Words of Length Seven
Number of Words of Length Eight
Number of Words of Length Nine
Number of Words of Length Ten
Number of Words
Number of Sentences
Number of Paragraphs
Average Word Length
Average Sentence Length
Average Paragraph Length
Standard Deviation of Word Length
Standard Deviation of Sentence Length
Standard Deviation of Paragraph Length
Third Moment of Word Length
Fourth Moment of Word Length

FIGURE I

ACTUARIAL VARIABLES INCLUDED IN PREDICTION EQUATIONS
FOR FLUENCY, FLEXIBILITY, AND ORIGINALITY







in Roget's International Thesaurus (1962) and Soule's Dictionary of English
Synonyms (1966). The responses of the students were then analyzed by SCORTXT,
performing a word/phrase lookup to determine how many categories were used.
In addition, since high correlations were found between some of the actuarial
measures and the Flexibility criteria, the variables listed in Figure I were
again used in the analysis. These data, both the category counts and the
actuarial scores, were used in the multiple regression analysis to predict the
Flexibility scores.
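
The word/phrase lookup can be sketched as follows. How SCORTXT matched entries is not described in the paper; this sketch assumes whole-word, case-insensitive matching after punctuation is stripped, and the category names and dictionary entries are invented for illustration.

```python
import re

def normalize(text):
    """Lowercase, replace punctuation with spaces, and pad with spaces
    so that whole words and phrases can be matched by substring search."""
    cleaned = re.sub(r"[^a-z0-9']+", " ", text.lower())
    return " " + " ".join(cleaned.split()) + " "

def count_categories(responses, category_dictionaries):
    """Count how many categories are hit by at least one response."""
    used = set()
    for response in responses:
        text = normalize(response)
        for category, entries in category_dictionaries.items():
            if any(" " + entry + " " in text for entry in entries):
                used.add(category)
    return len(used)

# Hypothetical dictionaries: key words plus synonyms for three categories
dictionaries = {
    "movement": ["running", "walk", "fall down"],
    "emotion": ["sad", "afraid", "happy"],
    "weather": ["rain", "cold"],
}
responses = ["Why is he running?", "Is he sad or afraid?", "Did he walk here?"]
category_count = count_categories(responses, dictionaries)
print(category_count)  # 2
```

A lookup of this kind counts a category once no matter how many responses fall in it, which matches the definition of Flexibility as the number of different clusters.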

For scoring Originality three dictionaries were constructed, based on
the possible Originality weights which the response might receive. The first
dictionary consisted of all zero weight entries listed in the scoring manual
developed by Torrance along with the synonyms of these entries extracted from
the Flexibility dictionaries already constructed. A similar procedure was
followed for the construction of the second dictionary, comprised of entries
for which the Originality weights were one. The remaining Flexibility entries
were then included in the Originality dictionary whose entries had weights of
two. This procedure was followed for each of the three Activities. As with
Flexibility scores, the actuarial variables were used in the development
of prediction equations.
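
One way to picture the construction of the three dictionaries is as a single lookup from entry to weight. The sketch below is an assumption-laden illustration: it represents the Flexibility dictionaries as groups of synonyms, treats an entry's synonyms as the other members of its group, and lets the lower weight win if an entry could fall in two dictionaries. None of these details is given in the paper, and all entries are invented.

```python
def build_weight_lookup(zero_entries, one_entries, synonym_groups):
    """Map each dictionary entry to an Originality weight of 0, 1, or 2."""
    lookup = {}
    # Manual entries of weight zero and one, extended by their synonyms
    for weight, manual_entries in ((0, zero_entries), (1, one_entries)):
        for entry in manual_entries:
            lookup.setdefault(entry, weight)
            for group in synonym_groups:
                if entry in group:
                    for synonym in group:
                        lookup.setdefault(synonym, weight)
    # Every remaining Flexibility entry receives a weight of two
    for group in synonym_groups:
        for entry in group:
            lookup.setdefault(entry, 2)
    return lookup

# Hypothetical manual entries and Flexibility synonym groups
groups = [{"sad", "unhappy"}, {"lost", "missing"}, {"magic", "spell"}]
weights = build_weight_lookup(["sad"], ["lost"], groups)
print(sorted(weights.items()))
```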

As indicated previously, the step-wise multiple regression technique was
employed to maximize the prediction of each subject's scores for each Activity
of the TTCT. Since nine scores were predicted for each individual, that is,
a Fluency, Flexibility, and Originality score for each of the three Activities,
nine separate analyses were performed, yielding nine different prediction
equations. The results of these analyses are summarized in Table 2.
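
For readers unfamiliar with the technique, a bare-bones forward selection is sketched below. It is not the procedure used in the study: the entry rule here is a simple minimum gain in multiple R (an assumption, where step-wise programs commonly use an F test), no variables are removed once entered, and the predictor names and data are invented.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def multiple_r(columns, y):
    """Multiple R of y regressed on the predictor columns, with an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * y[i] for i, row in enumerate(design)) for a in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_total = sum((v - mean_y) ** 2 for v in y)
    ss_resid = sum((v - f) ** 2 for v, f in zip(y, fitted))
    return max(0.0, 1.0 - ss_resid / ss_total) ** 0.5

def forward_stepwise(predictors, y, min_gain=0.01):
    """At each step add the predictor that most increases multiple R;
    stop when the best available gain falls below min_gain."""
    chosen, best_r = [], 0.0
    remaining = list(predictors)
    while remaining:
        trial = {name: multiple_r([predictors[c] for c in chosen] + [predictors[name]], y)
                 for name in remaining}
        name = max(trial, key=trial.get)
        if trial[name] - best_r < min_gain:
            break
        chosen.append(name)
        best_r = trial[name]
        remaining.remove(name)
    return chosen, best_r

# Invented data: six subjects, a criterion score, and three actuarial counts
criterion = [2, 4, 5, 7, 9, 10]
predictors = {
    "n_words": [11, 19, 30, 33, 48, 47],
    "question_marks": [2, 4, 5, 7, 9, 10],
    "commas": [0, 1, 0, 2, 1, 0],
}
chosen, r = forward_stepwise(predictors, criterion)
print(chosen, round(r, 2))
```

In this toy data the question-mark count reproduces the criterion exactly, so it enters first and nothing else adds enough to enter. With many predictors and few subjects the same greedy process can capitalize on chance, which is why the equations are checked on a separate cross-validation sample.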

Of the three scores to be predicted for each activity, it was
hypothesized that the prediction of Fluency would be the most accurate. The
results summarized in Table 2 support this hypothesis. That the multiple-R's
would be so high had not been expected, however, since no scheme for
determining the appropriateness of the responses was included in the scoring
procedure. Similarly, the multiple-R's obtained in the prediction
of both Flexibility and Originality were much higher than had been anticipated.

For the prediction of Flexibility and Originality, it was hypothesized
that the variable "category counts" would be the most important predictor,
since the counts were derived in accordance with Torrance's scoring norms.
However, this was true only for the prediction of the Activity 1, Originality
scores. For the prediction of the Flexibility scores of Activities 1 and 2
and the Originality score of Activity 2, "category counts" was the sixth best
predictor; for Activity 3, Flexibility, it was the twelfth best predictor;
and for Activity 3, Originality, the variable was not entered until the 24th
step of the regression analysis. A number of explanations might be given for
these results, but the explanation advanced earlier by Dieter Paulus (i.e.,
that Fluency is a necessary condition for Flexibility and Originality) appears
the most appropriate.

In cross validation, the multiple-R’s for Fluency held up very well, but 
sizeable shrinkage was found for the multiple-R's of the Flexibility and 






                              Table 2

                 SUMMARY OF RESULTS OF STEP-WISE
                  MULTIPLE REGRESSION ANALYSIS
              BOTH DEVELOPMENTAL AND CROSS-VALIDATED

                                                              Cross-Validated
                           Multiple-R      Multiple-R         Multiple-R
Criterion                  Developmental   Cross-Validated    Corrected for
                           (N=100)         (N=53)             Attenuation (N=53)

Activity 1, Fluency         .97**           .89**              .90**
Activity 1, Flexibility     .91**           .71**              .72**
Activity 1, Originality     .93**                              .83**
Activity 2, Fluency         .93**           .88**              .90**
Activity 2, Flexibility     .8?**           .68**              .71**
Activity 2, Originality     .83**                              .89**
Activity 3, Fluency         .93**           .88**              .92**
Activity 3, Flexibility     .83**           .36**              .39**
Activity 3, Originality     .91**           .72**              .99**

** Significant at .01 level




Originality dimension. However, when adjustments were made for the lack of 
perfect reliability in the criteria (i.e., the so-called "correction for
attenuation"), significant increases in the correlations were found for both
of these dimensions. Moreover, if fewer predictors were used in the development 
of the regression equations, as seems appropriate from the results obtained, 
the correlation found in cross validating the results would have been higher. 
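
The paper does not print the correction formula. The usual correction for unreliability in the criterion alone divides the observed correlation by the square root of the criterion reliability, and the sketch below assumes that form. The function name is hypothetical; the two worked values take a cross-validated multiple-R and the corresponding cross-validation reliability from Table 1.

```python
import math

def correct_for_attenuation(r_xy, criterion_reliability):
    """Correct a correlation for unreliability in the criterion only:
    r_corrected = r_xy / sqrt(r_yy)."""
    return r_xy / math.sqrt(criterion_reliability)

# Activity 2, Fluency: cross-validated R of .88, judge reliability of .95
print(round(correct_for_attenuation(0.88, 0.95), 2))  # 0.9
# Activity 1, Flexibility: cross-validated R of .71, judge reliability of .98
print(round(correct_for_attenuation(0.71, 0.98), 2))  # 0.72
```

Because the divisor is smallest where the judges agree least, the correction has the greatest effect on the Originality scores, whose criterion reliabilities are the lowest in Table 1.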

It appears, then, that creativity, as defined by Torrance, can be judged 
accurately by a computer. Further, it appears that the use of a computer to 
score open-ended responses to other standardized tests may be appropriate and
should be investigated. 







REFERENCES

Archambault, F. X. A computerized approach to scoring verbal responses to the
    Torrance Tests of Creative Thinking. Unpublished doctoral dissertation,
    University of Connecticut, 1969.

Fisher, G. A. The SCORTXT program for the analysis of natural language.
    Unpublished manuscript: Bureau of Educational Research, University
    of Connecticut, 1968.

Guilford, J. P. Nature of human intelligence. New York: McGraw-Hill, 1967.

Hoepfner, R. Review of Torrance tests of creativity. Journal of
    Educational Measurement, 1967, 4, 191-193.

McManus, J. F. Computer evaluation of college history examinations by actuarial
    strategies. Unpublished doctoral dissertation, The University of
    Connecticut, 1968.

Marcotte, D. R. A computerized contingency analysis of content graded essays.
    Unpublished doctoral dissertation, The University of Connecticut, 1969.

Mosier, C. I. The need and means of cross-validation. Educational and
    Psychological Measurement, 1951, 11, 5-28.

Page, E. B. & Paulus, D. H. The analysis of essays by computer. Final Report.
    Project No. 6-1518, Contract No. OEC-16-001318-1214. U. S. Department
    of Health, Education and Welfare. Storrs, Conn.: The University of
    Connecticut, 1968 (mimeo.).

Roget, P. (ed.). Roget's International Thesaurus. New York: Crowell, 1962.

Soule, R. Soule's dictionary of English synonyms. Boston: Little, Brown, &
    Co., 1966.

Torrance, E. P. Torrance Tests of Creative Thinking. Princeton, New Jersey:
    Personnel Press, 1966.

Winer, B. J. Statistical principles in experimental design. New York:
    McGraw-Hill, 1962.






